An Adaptive Algorithm for Tolerating Value Faults and Crash Failures

نویسندگان

Jennifer Ren

Michel Cukier

William H. Sanders

چکیده

The AQuA architecture provides adaptive fault tolerance to CORBA applications by replicating objects and providing a high-level method that an application can use to specify its desired level of dependability. This paper presents the algorithms that AQuA uses, when an application’s dependability requirements can change at runtime, to tolerate both value faults in applications and crash failures simultaneously. In particular, we provide an active replication communication scheme that maintains data consistency among replicas, detects crash failures, collates the messages generated by replicated objects, and delivers the result of each vote. We also present an adaptive majority voting algorithm that enables the correct ongoing vote while both the number of replicas and the majority size dynamically change. Together, these two algorithms form the basis of the mechanism for tolerating and recovering from value faults and crash failures in AQuA.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Reliability and Performance Evaluation of Fault-aware Routing Methods for Network-on-Chip Architectures (RESEARCH NOTE)

Nowadays, faults and failures are increasing especially in complex systems such as Network-on-Chip (NoC) based Systems-on-a-Chip due to the increasing susceptibility and decreasing feature sizes. On the other hand, fault-tolerant routing algorithms have an evident effect on tolerating permanent faults and improving the reliability of a Network-on-Chip based system. This paper presents reliabili...

متن کامل

A weakly-adaptive condition-based consensus algorithm in asynchronous distributed systems

The consensus problem is one of the most fundamental and important problems for designing fault-tolerant distributed systems. In the consensus problem, each process proposes a value, and all non-faulty processes have to agree on a common value that is proposed by a process. Despite of many applications (e.g., atomic broadcast [2, 6], shared object [1, 7], weak atomic commitment [5]), it is know...

متن کامل

An Algorithm for Tolerating Crash Failures in Distributed Systems

In the framework of the ESPRIT project 28620 “TIRAN” (tailorable fault tolerance frameworks for embedded applications), a toolset of error detection, isolation, and recovery components is being designed to serve as a basic means for orchestrating application-level fault tolerance. These tools will be used either as stand-alone components or as the peripheral components of a distributed applicat...

متن کامل

One Terminal Digital Algorithm for Adaptive Single Pole Auto-Reclosing Based on Zero Sequence Voltage

This paper presents an algorithm for adaptive determination of the dead timeduring transient arcing faults and blocking automatic reclosing during permanent faults onoverhead transmission lines. The discrimination between transient and permanent faults ismade by the zero sequence voltage measured at the relay point. If the fault is recognised asan arcing one, then the third harmonic of the zero...

متن کامل

Implementing Fault-Tolerant Services Using State Machines: Beyond Replication

This paper describes a method to implement fault-tolerant services in distributed systems based on the idea of fused state machines. The theory of fused state machines uses a combination of coding theory and replication to ensure efficiency as well as savings in storage and messages during normal operations. Fused state machines may incur higher overhead during recovery from crash or Byzantine ...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

IEEE Trans. Parallel Distrib. Syst.

دوره 12 شماره

صفحات -

تاریخ انتشار 2001

An Adaptive Algorithm for Tolerating Value Faults and Crash Failures

نویسندگان

چکیده

منابع مشابه

Reliability and Performance Evaluation of Fault-aware Routing Methods for Network-on-Chip Architectures (RESEARCH NOTE)

A weakly-adaptive condition-based consensus algorithm in asynchronous distributed systems

An Algorithm for Tolerating Crash Failures in Distributed Systems

One Terminal Digital Algorithm for Adaptive Single Pole Auto-Reclosing Based on Zero Sequence Voltage

Implementing Fault-Tolerant Services Using State Machines: Beyond Replication

عنوان ژورنال:

اشتراک گذاری